# ViT architecture

The models below are all built on the Vision Transformer (ViT) architecture. Each entry lists the publisher, license, task tags, a short description, and the download and like counts reported by the catalog.

| Model | Publisher | License | Tags | Description | Downloads | Likes |
|---|---|---|---|---|---|---|
| Ade20k Panoptic Eomt Large 640 | tue-mps | MIT | Image Segmentation, PyTorch | Reinterprets a plain Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential for segmentation tasks. | 105 | 0 |
| Ade20k Panoptic Eomt Giant 640 | tue-mps | MIT | Image Segmentation | Adapts the ViT architecture specifically for segmentation, showing its potential on image segmentation tasks (giant variant). | 116 | 0 |
| Vit Base Patch16 Clip 224.dfn2b | timm | Other | Image Classification, Transformers | Vision Transformer based on the CLIP architecture, carrying the DFN2B-CLIP image encoder weights released by Apple. | 444 | 0 |
| Llm Jp Clip Vit Base Patch16 | llm-jp | Apache-2.0 | Text-to-Image, Japanese | Japanese CLIP model trained with the OpenCLIP framework, supporting zero-shot image classification. | 40 | 1 |
| Vit Base Patch16 Clip 224.laion400m E31 | timm | MIT | Image Classification | Vision Transformer trained on the LAION-400M dataset, supporting zero-shot image classification. | 1,469 | 0 |
| Vit Base Patch32 Clip 224.laion400m E32 | timm | MIT | Image Classification | Vision Transformer trained on the LAION-400M dataset, compatible with both the OpenCLIP and timm frameworks. | 5,957 | 0 |
| Vit Facial Expression Recognition | Alpiyildo | Not specified | Face-related, Transformers | Facial expression recognition model based on the ViT architecture, fine-tuned on an imagefolder dataset with 91.77% accuracy. | 581 | 1 |
| Vit Base Violence Detection | jaranohaal | Apache-2.0 | Image Classification, Transformers, English | Violence detection model built on the ViT architecture, classifying images into violent or non-violent scenes. | 2,140 | 6 |
| Vit Facial Expression Recognition | motheecreator | Not specified | Face-related, Transformers | ViT-based facial expression recognition model fine-tuned on FER2013, MMI, and AffectNet, recognizing seven basic emotions. | 4,221 | 13 |
| AI VS REAL IMAGE DETECTION | Hemg | Apache-2.0 | Image Classification, Transformers | Classifier fine-tuned from Google's ViT architecture to distinguish AI-generated images from real images. | 259 | 2 |
| Vit Base Nsfw Detector | AdamCodd | Apache-2.0 | Image Classification, Transformers | ViT-based classifier that detects whether images contain NSFW (Not Safe For Work) content. | 1.2M | 47 |
| Vitforimageclassification | Andron00e | Apache-2.0 | Image Classification, Transformers | Fine-tune of google/vit-base-patch16-224-in21k on CIFAR-10, reaching 96.78% accuracy. | 43 | 2 |
| Vit Finetuned Vanilla Cifar10 0 | 02shanky | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on CIFAR-10, reaching 99.2% accuracy. | 68 | 1 |
| Phikon | owkin | Other | Image Classification, Transformers, English | Self-supervised model for histopathology trained with iBOT, used mainly to extract features from histology image patches. | 741.63k | 30 |
| Dinov2 Small | facebook | Apache-2.0 | Image Classification, Transformers | Small Vision Transformer trained with the self-supervised DINOv2 method, used to extract image features. | 5.0M | 31 |
| Sam Vit Base | facebook | Apache-2.0 | Image Segmentation, Transformers, Other | SAM generates high-quality object masks from input prompts such as points or boxes, and supports zero-shot segmentation. | 635.09k | 137 |
| Clasificacion Vit Model Manuel Chaves | machves | Apache-2.0 | Image Classification, Transformers | Classifier fine-tuned from google/vit-base-patch16-224-in21k, reaching 97.74% accuracy on the beans dataset. | 15 | 0 |
| Vit Base Railspace | Kaspar | Apache-2.0 | Image Classification, Transformers | Vision Transformer fine-tuned from google/vit-base-patch16-224-in21k, reaching 99.26% accuracy on its evaluation set. | 18 | 2 |
| VIT Food101 Image Classifier | StatsGary | Not specified | Image Classification, Transformers | Food image classifier based on the ViT architecture, trained on the Food101 dataset with 93.3% accuracy. | 41 | 0 |
| Vit Base Patch16 224 In21k Lcbsi | polejowska | Apache-2.0 | Image Classification, Transformers | Fine-tuned model based on Google's ViT architecture, suited to image classification tasks. | 33 | 0 |
| Vit Base Patch16 224 In21k Ft Cifar10test | minhhoque | Apache-2.0 | Image Classification, Transformers | Classifier based on Google's ViT, fine-tuned on the CIFAR-10 test set. | 29 | 0 |
| Vit Base Patch16 224 Finetuned Cifar10 | Weili | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on CIFAR-10, reaching 98.76% accuracy. | 15 | 0 |
| Vit Base Patch32 224 In21 Leicester Binary | davanstrien | Apache-2.0 | Image Classification, Transformers | Binary image classifier based on Google's ViT architecture, fine-tuned on a specific dataset for high-precision classification. | 15 | 0 |
| Vit Base Beans | christyli | Apache-2.0 | Image Classification, Transformers | Classifier fine-tuned from Google's ViT base model on the beans dataset, with 97.74% accuracy. | 31 | 0 |
| Syn10kplusog Oct ViT Base 8Epochs V1 | g30rv17ys | Not specified | Image Classification, Transformers | ViT-based image classifier reaching 88.67% accuracy after 8 epochs of training. | 13 | 0 |
| Syn10k Oct ViT Base 8Epochs V1 | g30rv17ys | Not specified | Image Classification, Transformers | ViT-based image classifier reaching 92.5% accuracy after 8 epochs of training. | 13 | 0 |
| Yolos Small Balloon | zoheb | Not specified | Object Detection, Transformers | YOLOS object detector built on the ViT architecture, trained with the DETR loss and fine-tuned on the COCO and Matterport Balloon datasets. | 101 | 1 |
| Vit Base Patch16 224 In21k Finetuned Cassava3 | siddharth963 | Apache-2.0 | Image Classification, Transformers | Classifier based on Google's ViT, fine-tuned on an imagefolder dataset with 88.55% accuracy. | 13 | 1 |
| Syn Oct ViT Base 4Epochs 30c V2 Run | g30rv17ys | Not specified | Image Classification, Transformers | ViT-based classifier trained on an OCT image dataset with 86.67% accuracy. | 13 | 0 |
| Vit Base Mnist | farleyknight-org-username | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on MNIST, reaching 99.49% accuracy. | 1,770 | 8 |
| Vit Base Patch16 224 In21k Finetuned Eurosat | Chandanab | Apache-2.0 | Image Classification, Transformers | ViT-based classifier fine-tuned on an image_folder dataset with 90.17% accuracy. | 16 | 0 |
| Vit Base Patch16 224 In21k Ucsat | YKXBCi | Apache-2.0 | Image Classification, Transformers | Vision Transformer classifier fine-tuned on an unspecified dataset. | 31 | 0 |
| Garbage Classification | yangy50 | Not specified | Image Classification, Transformers | Garbage classification model based on the ViT architecture, reaching 95% test accuracy on a six-class garbage dataset. | 165 | 1 |
| Ak Vit Base Patch16 224 In21k Image Classification | amitkayal | Apache-2.0 | Image Classification, Transformers | Classifier based on Google's ViT, fine-tuned on a custom image dataset with a reported evaluation accuracy of 100%. | 19 | 0 |
| Vit Base Patch16 224 In21k Eurosat | YKXBCi | Apache-2.0 | Image Classification, Transformers | Fine-tune of google/vit-base-patch16-224-in21k on an unspecified dataset, intended for image classification. | 23 | 0 |
| Violation Classification Bantai Vit V100ep | AykeeSalazar | Apache-2.0 | Image Classification, Transformers | ViT-based classifier for prohibited-content recognition, reaching 91.57% accuracy on its evaluation set. | 32 | 0 |
| Vit Base Patch16 224 In21k Finetuned Cifar10 | aaraki | Apache-2.0 | Image Classification, Transformers | Pre-trained Google ViT fine-tuned on CIFAR-10 for image classification. | 16.69k | 10 |
| Vision Transformer Fmri Classification Ft | shivkumarganesh | Not specified | Image Classification, Transformers | fMRI image classification model based on the ViT architecture, generated automatically with HuggingPics. | 82 | 3 |
| Vit Base Patch16 224 In21k Eurosat | philschmid | Apache-2.0 | Image Classification, Transformers | Remote-sensing image classifier fine-tuned from Google's ViT on the EuroSAT dataset. | 28 | 1 |
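Most of the classification entries above are standard ViT fine-tunes and can be loaded through the Hugging Face `transformers` image-classification pipeline. A minimal sketch follows; the repository id `aaraki/vit-base-patch16-224-in21k-finetuned-cifar10` is assumed to correspond to the "Vit Base Patch16 224 In21k Finetuned Cifar10" entry by aaraki, and `example.jpg` is a placeholder image path.

```python
from transformers import pipeline

# Assumed repo id for the aaraki CIFAR-10 fine-tune listed above;
# substitute any of the other classification checkpoints.
MODEL_ID = "aaraki/vit-base-patch16-224-in21k-finetuned-cifar10"

# The image-classification pipeline wraps the image processor and
# ViTForImageClassification head behind one call.
classifier = pipeline("image-classification", model=MODEL_ID)

# Accepts a local path, URL, or PIL image; returns label/score pairs.
predictions = classifier("example.jpg")
for p in predictions:
    print(f'{p["label"]}: {p["score"]:.3f}')
```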
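The self-supervised backbones in the list (Dinov2 Small, Phikon) are typically used as frozen feature extractors rather than end-to-end classifiers. A sketch of extracting global and per-patch features from `facebook/dinov2-small` with `transformers`, again assuming a local `example.jpg`:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-small")
model = AutoModel.from_pretrained("facebook/dinov2-small")
model.eval()

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Token 0 is the [CLS] token: one global embedding per image.
cls_embedding = outputs.last_hidden_state[:, 0]
# The remaining tokens are per-patch embeddings, useful for dense tasks.
patch_embeddings = outputs.last_hidden_state[:, 1:]
print(cls_embedding.shape, patch_embeddings.shape)
```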
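Sam Vit Base differs from the classifiers in that it is prompt-driven: the model takes an image plus point or box prompts and returns candidate masks with quality scores. A sketch of the usage documented for the `transformers` SAM classes, assuming a single point prompt at pixel (450, 600) on a local image:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

image = Image.open("example.jpg").convert("RGB")
# One (x, y) point prompt for the single input image.
input_points = [[[450, 600]]]

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale the predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape, outputs.iou_scores)
```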